Data Analysis Project: Σ-Optimality for Active Learning on Gaussian Random Fields

Authors

  • Yifei Ma
  • Jeff Schneider
  • Barnabas Poczos
  • Roy Maxion
Abstract

A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, which is equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property, which shows that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σ-optimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification.
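The two criteria named in the abstract can be compared with a small sketch. It is an illustration under stated assumptions, not the authors' implementation: the prior covariance is taken to be the inverse of a regularized graph Laplacian (the regularizer `delta` is an assumption added to make the matrix invertible), queried labels are treated as noise-free observations, and the function names `grf_prior_covariance` and `greedy_select` are hypothetical. Under these assumptions, conditioning on node v is the standard Gaussian rank-one update of the covariance, so V-optimality greedily picks the node with the largest reduction in the trace and Σ-optimality the node with the largest reduction in the sum of all entries.

```python
import numpy as np


def grf_prior_covariance(W, delta=1e-2):
    """Prior covariance of a GRF, assumed here to be inv(Laplacian + delta * I).

    W is a symmetric, non-negative edge-weight matrix. delta is a small
    regularizer (an assumption of this sketch) that makes the Laplacian invertible.
    """
    W = np.asarray(W, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(laplacian + delta * np.eye(len(W)))


def greedy_select(cov, budget, criterion="sigma"):
    """Greedily choose `budget` nodes to query.

    criterion="v":     maximize the reduction in trace(cov)        (V-optimality)
    criterion="sigma": maximize the reduction in the sum of cov    (Sigma-optimality)
    """
    cov = np.array(cov, dtype=float)
    remaining = list(range(len(cov)))
    chosen = []
    for _ in range(budget):
        # Covariance restricted to the nodes that are still unlabeled.
        sub = cov[np.ix_(remaining, remaining)]
        variances = np.diag(sub)
        if criterion == "sigma":
            gain = sub.sum(axis=0) ** 2 / variances
        elif criterion == "v":
            gain = (sub ** 2).sum(axis=0) / variances
        else:
            raise ValueError("criterion must be 'sigma' or 'v'")
        v = remaining[int(np.argmax(gain))]
        chosen.append(v)
        # Gaussian conditioning on a noise-free observation of node v.
        cov = cov - np.outer(cov[:, v], cov[v, :]) / cov[v, v]
        remaining.remove(v)
    return chosen


if __name__ == "__main__":
    # Toy example: a path graph with 5 nodes and unit edge weights.
    W = np.zeros((5, 5))
    for i in range(4):
        W[i, i + 1] = W[i + 1, i] = 1.0
    prior = grf_prior_covariance(W)
    print("Sigma-optimality picks:", greedy_select(prior, 2, "sigma"))
    print("V-optimality picks:    ", greedy_select(prior, 2, "v"))
```

The sum of the entries of the predictive covariance is the variance of the sum of the unlabeled node values, which is why the Σ criterion lines up with the surveying problem of estimating a class proportion.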

Similar resources

Σ-Optimality for Active Learning on Gaussian Random Fields

A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e)...

Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields

Many real-world datasets can be represented in the form of a graph whose edge weights designate similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the prediction covariance Σ (V-optimality) on GRFs has proven successful in batch active learn...

Active Search and Bandits on Graphs using Sigma-Optimality

Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword-based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...

On Temporal Evolution in Data Streams

The future of CiteSeer: CiteSeerX; Learning to have fun; Winning the DARPA grand challenge; Challenges of urban sensing; Learning in one-shot strategic form games; A selective sampling strategy for label ranking; Combinatorial Markov random fields; Learning stochastic tree edit distance; Pertinent background knowledge for learning protein gramma...

Factors Influencing Robustness and Effectiveness of Conditional Random Fields in Active Learning Frameworks

Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in ter...


Publication date: 2014